A speaker biased SI recognizer for embedded mobile applications
Abstract
Non-native and accented speakers often face problems when using a speaker-independent (SI) speech recognition system. Speaker adaptation is a common way to make an SI recognizer work better for individual users. Targeting embedded implementations and applications in fast-changing mobile environments, this paper proposes a supervised speaker adaptation (SA) solution with low system resource consumption, minimal disturbance to the data structure of the SI recognizer, and superior adaptation performance. After adaptation to UK speakers on a digit recognition task, the US English speech recognizer achieved a 65.9% reduction in digit errors. Other advantages of the proposed SA method include multi-speaker adaptation, fast adaptation, and speaker independence that changes little after adaptation.
Similar resources
Experiments and Analysis of Speaker Dependent Mandarin Monosyllable Recognition
It is well known that the word accuracy of a speaker-independent (SI) continuous speech recognition system is not good enough for many real-world applications because of the many interference factors in the speech signal: pronunciation variation across speakers, different kinds of environmental noise, and so on. Thus, analyzing the action procedure of each interference factor, then eliminating its effect as po...
Speaking rate normalization with lattice-based context-dependent phoneme duration modeling for personalized speech recognizers on mobile devices
Voice access to cloud applications, including social networks, from mobile devices has become attractive. Personalized speech recognizers on mobile devices have become feasible because most mobile devices have only a single user. Speaking rate variation is known to be an important source of performance degradation in spontaneous speech recognition. Speaking rate is speaker dependent, it cha...
Speech Recognition Methods and their Potential for Dialogue Systems in Mobile Environments
The DaimlerChrysler speech recognizer is specialized for robust speech recognition in noisy environments, in particular for command and control applications. The recognizer that is used in cars has fixed grammars, which restrict the speaker to using short commands. This paper presents methods that allow the user to speak more freely and add spontaneous words to the commands: language modelling,...
Impact of Vocal Tract Length Normalization on the Speech Recognition Performance of an English Vowel Phoneme Recognizer for the Recognition of Children Voices
Differences in human vocal tract lengths cause inter-speaker acoustic variability in speech signals produced by different speakers for the same text, and these variations affect the robustness of a speaker-independent (SI) speech recognition system. Speaker normalization using vocal tract length normalization (VTLN) is an effective approach to reduce the effect of these...
Model-combination-based acoustic mapping
We propose a new method for compensating distortions in the speech signal caused by environment changes. The basic method concentrates on additive noise, but can be extended to also address channel and, to some extent, speaker changes. Combining compensation with adaptation techniques leads to high error rate reductions for mobile speech applications. Thereby, it is more efficient than adap...